Data Mining Meets HCI: Making Sense of Large Graphs

Authors

  • Duen Horng Chau
Abstract

We have entered the age of big data. Massive datasets are now common in science, government, and enterprises. Yet making sense of these data remains a fundamental challenge. Where do we start our analysis? Where do we go next? How do we visualize our findings? We answer these questions by bridging Data Mining and Human-Computer Interaction (HCI) to create tools for making sense of graphs with billions of nodes and edges, focusing on: (1) Attention Routing: we introduce this idea, based on anomaly detection, which automatically draws people’s attention to interesting areas of the graph where they can start their analyses. We present three examples: Polonium unearths malware from 37 billion machine-file relationships; NetProbe fingers bad guys who commit auction fraud. (2) Mixed-Initiative Sensemaking: we present two examples that combine machine inference and visualization to help users locate the next areas of interest: Apolo guides users in exploring large graphs by learning from a few examples of user interest; Graphite finds interesting subgraphs based only on fuzzy descriptions drawn graphically. (3) Scaling Up: we show how to enable interactive analytics of large graphs by leveraging Hadoop, staging of operations, and approximate computation. This thesis contributes to data mining, HCI, and, importantly, their intersection, including: interactive systems and algorithms that scale; theories that unify graph mining approaches; and paradigms that overcome fundamental challenges in visual analytics. Our work is making an impact on academia and society: Polonium protects 120 million people worldwide from malware; NetProbe made headlines on CNN, WSJ, and USA Today; Pegasus won an open-source software award; Apolo helps DARPA detect insider threats and prevent exfiltration. We hope our Big Data Mantra, “Machine for Attention Routing, Human for Interaction,” will inspire more innovations at the crossroads of data mining and HCI.
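The abstract names attention routing and "learning from a few examples of user interest" without showing the inference behind them. Polonium and Apolo both build on belief propagation, a guilt-by-association technique in which a few labeled nodes spread their evidence to unlabeled neighbors. What follows is a minimal sketch of two-state loopy belief propagation under stated assumptions, not the thesis's implementation: the function name `belief_propagation`, the homophily parameter `EPS`, the good/bad labels, and the toy machine-file graph are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of guilt-by-association inference via loopy belief
# propagation (two states: 0 = good, 1 = bad). Not the thesis's code.

# Assumed homophily edge potential: linked nodes tend to share a label.
EPS = 0.1
PSI = [[1 - EPS, EPS],
       [EPS, 1 - EPS]]


def belief_propagation(edges, priors, iterations=20):
    """Run loopy BP on an undirected graph.

    edges:  list of (u, v) node pairs
    priors: dict node -> [p_good, p_bad]
    Returns dict node -> normalized belief [p_good, p_bad].
    """
    neighbors = {n: [] for n in priors}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # messages[(i, j)]: message from node i to node j, initialized uniform.
    messages = {(i, j): [1.0, 1.0] for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_messages = {}
        for (i, j) in messages:
            msg = [0.0, 0.0]
            for xj in range(2):
                for xi in range(2):
                    # Prior of i, times edge potential, times messages
                    # into i from every neighbor except j.
                    prod = priors[i][xi] * PSI[xi][xj]
                    for k in neighbors[i]:
                        if k != j:
                            prod *= messages[(k, i)][xi]
                    msg[xj] += prod
            total = sum(msg)
            new_messages[(i, j)] = [m / total for m in msg]
        messages = new_messages

    # Belief = prior times all incoming messages, normalized.
    beliefs = {}
    for i in neighbors:
        b = list(priors[i])
        for k in neighbors[i]:
            b = [b[x] * messages[(k, i)][x] for x in range(2)]
        total = sum(b)
        beliefs[i] = [x / total for x in b]
    return beliefs


# Toy, made-up machine-file graph: one known-bad file, one known-good file,
# and one unlabeled file that shares a machine with the bad one.
priors = {
    "m1": [0.5, 0.5],
    "m2": [0.5, 0.5],
    "f_bad": [0.01, 0.99],
    "f_good": [0.99, 0.01],
    "f_new": [0.5, 0.5],
}
edges = [("m1", "f_bad"), ("m1", "f_new"), ("m2", "f_good")]

beliefs = belief_propagation(edges, priors)
# f_new's "bad" belief ends up around 0.81 on this toy graph.
print(round(beliefs["f_new"][1], 2))
```

On this tree-shaped toy graph the messages converge after two rounds; on real graphs with cycles, loopy BP is an approximation and usually needs a convergence check rather than a fixed iteration count.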





Publication date: 2012